Johannes Schauer: port bootstrap build-ordering tool report 2
A copy of this post is sent to
soc-coordination@lists.alioth.debian.org
as well as to
debian-bootstrap@lists.mister-muffin.de.
Diary
June 4
I added the first version of basenocycles.ml to git. Given an initial set of
cross built packages, it tries to compile as much as possible on the resulting
system in multiple rounds.
June 5
During June 3, I discovered and error in my program that would only come up
when using the Debian Sid package lists as the input:
Fatal error: exception Assert_failure("common/edosSolver.ml", 610, 11)
On this day, June 5, I wrote a minimal test case for this problem.
The same day, Pietro figured out that this is a bug in dose which will be
fixed in the next release.
Begin writing code to figure out how important a binary package is for the
further build process.
Try to use Depsolver.edos_install to find out what packages are needed to make
debhelper available.
Restructure basenocycles.ml, exclude source packages that already have been
built, still trouble with already existing binary packages and
Cudf.mem_installed, comment stuff better.
June 6
I wrote some crude code (only estimate, not correct, fixed later) that would
give a rough overview of how often a given binary package is directly required
as a build dependency.
Debhelper came out as the most needed package. It is architecture:all, so it
does not have to be built but it has unfulfilled runtime dependencies. To make
those packages available, 13 (actually 11, fixed later) packages have to be
compiled on Ubuntu Natty. But those packages all (except gettext) require
debhelper itself to be built. The first dependency cycle.
This dependency cycle (actually, the 12 cycles) can be broken by either cross
compiling those source packages or by making them build without debhelper. One
goal of the program is to help decide what the easier option is, but this is
not yet implemented.
To play around a bit, I created the possibility to specify a list of packages
that are additionally to the minimal set of cross compiled packages also cross
compiled. I added the 13 packages found above to the list, thus making the
binary packages they build available. This made debhelper available in the
system.
As a result, 1625 out of 3339 source packages can be built with just a minimal
build system (priority:essential packages plus build-essential) and debhelper
available.
The next package that blocks the most other source packages from being built is
cdbs. The next nine packages in that list also require cdbs so it seems to be
the next important package to be available.
Pietro's suggestions make me:
- do not open BootstrapCommon but ExtLib, Common, Algo, Debian
- do proper option parsing and logging
- use Debcudf.ignore_essential = true
- do Debcudf.init_tables (binlist@srclist)
- use @ with shorter list first
- use more List.rev_append instead of @
- use CudfAdd.who_provides to find if a package is available
June 7
Pietro and Patrick suggest that for solving the debhelper cycles, one
could provide a minimal debhelper version so that the above list of 12 packages
can be built without debhelper.
I try to figure out how to get a list of packages that are missing to make a
package installable/buildable. This functionality should be provided in dose
but I fail to find it.
June 8
Lacking a solution of the problem of June 7, I write a mail to Pietro.
I start my first graphs in ocaml using the ocamlgraph library.
The graph I generate, starts off at a binary package. For each binary package
it connects new vertices as its runtime dependencies. If a binary package is
not arch:all and also not yet otherwise compiled, its source package is also
added.
The result is a graph in which set of source packages in it will make the
initial package available, if those source packages would be cross compiled.
The graph is extended further than the source packages.
June 9
I refine a couple of functions, make univ_get_pkg_by_name return the package
with the highest version number.
I wrote a rather lengthy (1027 words) email to the list that explains my
status as of this day.
I can create graphviz dot files with ocaml, can create node and edge types and
create the graph by an imperative pattern that I saw a lot in Pietro's code.
Disjunctions are not yet handled correctly (see mail from June 8).
The graphs generated look like the following:
http://mister-muffin.de/p/8nyc.png
June 11
I write a test case which shows how CudfAdd.who_provides doesnt find
virtual packages.
Automate the process of finding the packages that, if cross compiled, would
make another package available.
Add more predicates (identifying source packages) and improve input file
reading code.
Move build_compile_rounds which compiles as many source packages as it can in
multiple rounds on a given minimal system a toplevel function and thereby
abstract it more.
Create a rudimentary text based menu to choose different actions to take for an
analysis.
Start writing an extended version of simple_dependency_graph for deeper
analysis.
Use xdot to show graphs from the text menu. Allow saving those graphs to a
file.
June 12
Move functionality from the extended version of simple_dependency_graph over to
the normal version and delete the extended version.
Add the new Closure vertex type.
Create extended_dependency_graph which is supposed to not contain single binary
package vertices but handle a package and its installation set as one vertex.
The output of extended_dependency_graph is optionally reduced to the biggest
(non degenerate) strongly connected component.
User gets the option of choosing the exploration depth.
June 13
Pietro replies to my mail from June 8 but apparently I failed to express
myself well enough in my last mail, so I rephrase my question.
Pietro replies to my email from June 11 and explains how the effect I see
is due to "a nuisance of the debian to cudf encoding". As a result I change my
code accordingly.
June 14
Another lengthy (1130 words) email to the list. I explain what was done
in the past days, what parts work and how they work. I list some rationales on
why I did things the way I did them.
The most important observation is, that after improving my code again and
again, I ended up representing the dependency cycle problem in the same (very
similar) way that Pietro suggested in the beginning. This is probably a good
discovery.
Lots of data of that email is now of only little use as of June 16, I make lots
of improvements in correctness.
As I dont have an answer to my other email to Pietro from June 13, I implement
a very crude way to get an answer to the question of what packages are missing
for a package to be available/compileable. I called it
flatten_vpkgformula_best_effort and it suffers from many faults including
disjunctions and package conflicts.
Patrick spots a problem. As a result, I make sure that at no point, the source
package of an arch:all package can be listed.
June 15
As a reply to my mail from June 13, Pietro creates a new branch in the git and
adds the code I needed to get a proper installation set.
June 16
As a result of Pietro's efforts from June 15, I make great advancements
on all fronts.
Details of the current status follow in the next section.
Results
A big leap was made on June 16 due to Pietro's great help on making me
understand how Depsolver.listcheck can be used for my purposes. My difficulties
in finding the solution myself are rooted in many parts of the dose framework
being poorly commented but Pietro did already a couple of documentation commits
whenever things were unclear for me.
Using Depsolver.listcheck makes it possible to be more distribution agnostic
and I dont have to handle vpkgs, virtual packages and constraints myself
anymore. The code also doesnt suffer anymore by wrongly analyzed dependencies
and conflicts. The only thing that is not yet taken care of, is that
Depsolver.listcheck only chooses one out of several possible installation set.
A final version should be able to take into account that a different
installation set could provide a better solution.
Overall, in comparison to two weeks ago, I can now properly build, traverse and
analyze graphs, can choose an installation set properly, understand more about
dependencies, closures, dose and ocaml in general.
Finding the importance of binary packages for building
When calculating how many source packages are depending on the
availability of a binary package I originally flattened the
pkg.Cudf.depends list twice for a rough overview. This is of course
wrong due to disjunctions and conflicts and also doesnt provide a deep
dependency inspection. The new method is to calculate an installation
set that is necessary to compile a source package for every source
package. The resulting list of binary packages is then used to find out
how often a binary package appears in an installation set.
I see three drawbacks though:
- calculating an installation set for each source package in the archive is very slow
- if X packages build depend on A then also X packages will build depend on the installation set of A, resulting in lots of duplication
- only one installation set is selected though there are many